📦 Executable Size
Specific: Binary Optimization, Dead Code Elimination, Compression, Linking
Scoured 160,326 posts in 11.7 ms
Intel Binary Optimization Tool Changes Code Execution with Heavy Vectorization
🎯 CPU Dispatch · techpowerup.com · 2d
jhammant/Turbo1bit: Turbo1Bit: Combining 1-bit LLM weights (Bonsai) with TurboQuant KV cache compression for maximum inference efficiency. 4.2x KV cache compression + 16x weight compression = ~10x total memory reduction.
🗺️ Region Inference · github.com · 39m · Hacker News
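The arithmetic in the Turbo1Bit blurb (16x on weights, 4.2x on KV cache, ~10x overall) only works out for a particular split of memory between the two. A minimal sketch of the combination rule, assuming (hypothetically) that weights account for about 80% of inference memory:

```python
def combined_reduction(weight_frac, weight_ratio, kv_ratio):
    """Overall memory reduction when two memory components are
    compressed by different ratios: the compressed size is the sum
    of each component's fraction divided by its own ratio."""
    kv_frac = 1.0 - weight_frac
    return 1.0 / (weight_frac / weight_ratio + kv_frac / kv_ratio)

# Hypothetical split: weights ~80% of memory, KV cache ~20%.
print(round(combined_reduction(0.8, 16.0, 4.2), 1))  # ~10x, matching the blurb
```

With a more KV-heavy workload (say a 50/50 split, as in long-context serving), the same ratios combine to only ~6.7x, so the headline ~10x depends on the assumed split.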
I Read a Gzip Decompressor Written in 250 Lines of Rust — and Compression Finally Made Sense
📦 Compression Algorithms · medium.com · 5d
Google's TurboQuant saves memory, but won't save us from DRAM-pricing hell
🗺️ Region Inference · theregister.com · 23h
Fujitsu One Compression (LLM Quantization)
📦 Compression Algorithms · fujitsuresearch.github.io · 1d · Hacker News
Google Research talks compression technology it says will greatly reduce memory needed for AI processing
💾 Cache-Oblivious Algorithms · networkworld.com · 28m
compression.zstd – Compression compatible with the Zstandard format
📦 Compression Algorithms · docs.python.org · 3d · Hacker News
Say "Chameleonicity" Three Times Fast. Or Read This Post.
✨ Effect Inference · science.org · 3h
Analyzing Geekbench 6 under Intel's BOT
📅 Instruction Scheduling · geekbench.com · 2d · Hacker News, r/hardware
Overview of Content Published in March
🪄 Syntax Macros · blog.didierstevens.com · 22h
Geekbench says Intel BOT rewrites benchmark code, Geekbench 6.7 will detect optimized runs
🔮 Speculative Execution · videocardz.com · 2d
Archive Format Guide 2024: ZIP vs 7Z vs RAR vs TAR vs GZIP - Complete Compression Comparison
📦 Compression Algorithms · luxa.org · 3d
PrismML, which says its 1-bit LLM achieves radical compression without sacrificing performance, comes out of stealth with $16.25M in SAFE and seed funding (Stev...
📏 Linear Types · techmeme.com · 2d
TIL: Quantisation
∀ Quantified Types · anup.io · 5d
Google's TurboQuant Changes the Economics of Local AI Inference
🗺️ Region Inference · medium.com · 4d
Google research cuts LLM memory use by 6x
📊 Memory Profilers · kite.kagi.com · 6d
Pure C implementation of the TurboQuant paper (ICLR 2026) for KV cache compression in LLM inference.
🗺️ Region Inference · github.com · 1d · r/LocalLLaMA
The Sequence Radar #832: Last Week in AI: Compression, Voice, and Why It All Matters
🏁 Language Benchmarks · thesequence.substack.com · 4d · Substack
Google's New AI Compression Could Help Lower RAM Prices - Here's How
⚡ Cache-Aware Algorithms · bgr.com · 5d
Will Google's TurboQuant AI Compression Finally Demolish the AI Memory Wall?
💾 Cache-Oblivious Algorithms · buysellram.com · 6d · Hacker News